AITopics | image diffusion model

image diffusion model

af9ac087ed9123957bb3a45dca56b9d4-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 11:37:14 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

71c9eb0913e6c7fda3afd69c914b1a0c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-15-2026, 19:45:25 GMT

diffusion model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

HeadSculpt: Crafting 3D Head Avatars with Text

Neural Information Processing Systems, Dec-23-2025, 22:17:50 GMT

Recently, text-guided 3D generative methods have made remarkable advancements in producing high-quality textures and geometry, capitalizing on the proliferation of large vision-language and image diffusion models. However, existing methods still struggle to create high-fidelity 3D head avatars in two aspects: (1) they rely mostly on a pre-trained text-to-image diffusion model and lack the necessary 3D awareness and head priors, which makes them prone to inconsistency and geometric distortions in the generated avatars.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Neural Information Processing Systems, Oct-10-2025, 13:21:36 GMT

Diffusion models have achieved remarkable advancements in text-to-image generation.

arxiv preprint arxiv, diffusion model, realcompo, (13 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

Neural Information Processing Systems, Oct-10-2025, 05:54:51 GMT

While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored.

diffusion model, multi-view video, video, (15 more...)

Neural Information Processing Systems

Country:

Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

VENTURA: Adapting Image Diffusion Models for Unified Task Conditioned Navigation

Zhang, Arthur, Meng, Xiangyun, Calliari, Luca, Kim, Dong-Ki, Omidshafiei, Shayegan, Biswas, Joydeep, Agha, Ali, Shaban, Amirreza

arXiv.org Artificial Intelligence, Oct-3-2025

Robots must adapt to diverse human instructions and operate safely in unstructured, open-world environments. Recent vision-language models (VLMs) offer strong priors for grounding language and perception but remain difficult to steer for navigation, because differences in action spaces and pretraining objectives hamper their transfer to robotics tasks. Towards addressing this, we introduce VENTURA, a vision-language navigation system that finetunes internet-pretrained image diffusion models for path planning. Instead of directly predicting low-level actions, VENTURA generates a path mask (i.e., a visual plan) in image space that captures fine-grained, context-aware navigation behaviors. A lightweight behavior-cloning policy grounds these visual plans into executable trajectories, yielding an interface that follows natural language instructions to generate diverse robot behaviors. To scale training, we supervise on path masks derived from self-supervised tracking models paired with VLM-augmented captions, avoiding manual pixel-level annotation or highly engineered data collection setups. In extensive real-world evaluations, VENTURA outperforms state-of-the-art foundation model baselines on object reaching, obstacle avoidance, and terrain preference tasks, improving success rates by 33% and reducing collisions by 54% across both seen and unseen scenarios. Notably, we find that VENTURA generalizes to unseen combinations of distinct tasks, revealing emergent compositional capabilities. Videos, code, and additional materials: https://venturapath.github.io
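To make the idea of a path mask as an image-space "visual plan" more concrete, the following Python sketch turns a binary mask into a short list of pixel waypoints. It is a hypothetical hand-written heuristic: the function `mask_to_waypoints` and its row-centering rule are assumptions for illustration only. VENTURA grounds its masks with a learned behavior-cloning policy, which this sketch does not attempt to reproduce.

```python
import numpy as np


def mask_to_waypoints(mask: np.ndarray, num_waypoints: int = 5) -> np.ndarray:
    """Turn a binary path mask of shape (H, W) into (row, col) waypoints.

    Hypothetical heuristic, NOT VENTURA's learned policy: for every image row
    that contains path pixels, take the mean column index, order the rows from
    the bottom of the image (nearest the robot) to the top, and subsample.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return np.empty((0, 2))
    rows = rows[::-1]  # bottom of the image first
    centers = np.array([[r, np.flatnonzero(mask[r]).mean()] for r in rows])
    k = min(num_waypoints, len(centers))
    idx = np.linspace(0, len(centers) - 1, num=k).round().astype(int)
    return centers[idx]


if __name__ == "__main__":
    # Toy 10x10 mask: a straight path occupying columns 4-5 in rows 2-9.
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:10, 4:6] = True
    print(mask_to_waypoints(mask))
```

Keeping the plan in image space is what lets a pretrained image diffusion model produce it; any downstream controller, learned or hand-written, only has to interpret pixels.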

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.01388

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models

Neural Information Processing Systems, May-27-2025, 13:52:51 GMT

Naively using image models to solve video inverse problems often produces flickering, texture sticking, and temporal inconsistency in the generated videos. To tackle these problems, we view frames as continuous functions in 2D space and videos as sequences of continuous warping transformations between frames. This perspective allows us to train function-space diffusion models only on **images** and use them to solve temporally correlated inverse problems. The function-space diffusion models need to be equivariant with respect to the underlying spatial transformations. To ensure temporal consistency, we introduce a simple post-hoc, test-time guidance towards (self-)equivariant solutions.
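Equivariance here means that warping the input and then applying the model gives the same result as applying the model and then warping its output. The Python sketch below measures that gap, ||f(T x) - T f(x)||, for two toy operators under a circular integer shift. This is an illustrative assumption: the paper deals with general continuous warps and trained function-space diffusion models, not circular shifts and fixed filters, and the helper names are hypothetical.

```python
import numpy as np


def shift(x: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Toy spatial transformation T: circular integer translation."""
    return np.roll(x, shift=(dy, dx), axis=(0, 1))


def box_blur(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter with circular padding (commutes with circular shifts)."""
    out = np.zeros_like(x, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += shift(x, dy, dx)
    return out / 9.0


def position_gain(x: np.ndarray) -> np.ndarray:
    """Multiplies by a fixed horizontal ramp, so it is NOT shift-equivariant."""
    ramp = np.linspace(0.0, 1.0, x.shape[1])[None, :]
    return x * ramp


def equivariance_error(f, x: np.ndarray, dy: int, dx: int) -> float:
    """Norm of f(T x) - T f(x); zero exactly when f commutes with T on x."""
    return float(np.linalg.norm(f(shift(x, dy, dx)) - shift(f(x), dy, dx)))


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((16, 16))
    print(equivariance_error(box_blur, x, 2, 3))       # ~0
    print(equivariance_error(position_gain, x, 2, 3))  # clearly > 0
```

A quantity of this kind is what a test-time guidance term could push down; how the paper actually formulates and weights its guidance is not reproduced here.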

image diffusion model, video inverse problem, warped diffusion, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

Neural Information Processing Systems, May-27-2025, 08:12:44 GMT

We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are trained to reconstruct the respective objects in a different image, e.g., a later frame in the video. Importantly, we encode object visuals from the reference image while conditioning on object poses from the target frame, which enables learning disentangled appearance and position features. Combining visual and 3D pose representations in a sequence-of-tokens format allows us to keep the text-to-image interface of existing models, with Neural Assets in place of text tokens.
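The "sequence-of-tokens" interface can be pictured with a small Python sketch: each object contributes one token built from its appearance features (from the reference image) and its pose features (from the target frame), and the resulting sequence is consumed by cross-attention exactly where text embeddings would go. The concatenation layout, all dimensions, and the random projection matrices are hypothetical stand-ins for learned components, and the attention uses the tokens directly as keys and values (no learned K/V projections) to stay short.

```python
import numpy as np


def neural_asset_tokens(appearance, poses, w_app, w_pose):
    """One token per object: [projected appearance | projected pose].

    appearance: (K, Da) visual features pooled per object from the reference image
    poses:      (K, Dp) 3D pose parameters per object from the target frame
    w_app:      (Da, Ta) and w_pose: (Dp, Tp) projections (learned in practice)
    returns:    (K, Ta + Tp) token sequence used in place of text tokens
    """
    return np.concatenate([appearance @ w_app, poses @ w_pose], axis=-1)


def cross_attention(queries, context):
    """Single-head scaled dot-product attention from image queries to context tokens.

    Keys and values are the context tokens themselves (simplification).
    """
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, Da, Dp, Ta, Tp = 3, 64, 12, 384, 384  # hypothetical sizes
    appearance = rng.standard_normal((K, Da))
    poses = rng.standard_normal((K, Dp))
    w_app = rng.standard_normal((Da, Ta)) / np.sqrt(Da)
    w_pose = rng.standard_normal((Dp, Tp)) / np.sqrt(Dp)
    tokens = neural_asset_tokens(appearance, poses, w_app, w_pose)
    image_queries = rng.standard_normal((16, Ta + Tp))  # e.g. 4x4 latent patches
    print(tokens.shape, cross_attention(image_queries, tokens).shape)
```

Because appearance and pose occupy separate parts of each token in this layout, changing an object's pose leaves its appearance half untouched, which loosely mirrors the disentanglement the abstract describes.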

3d-aware multi-object scene synthesis, image diffusion model, neural asset, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Generating time-consistent dynamics with discriminator-guided image diffusion models

Hess, Philipp, Gelbrecht, Maximilian, Schötz, Christof, Aich, Michael, Huang, Yu, Yang, Shangshang, Boers, Niklas

arXiv.org Artificial Intelligence, May-16-2025

Realistic temporal dynamics are crucial for many video generation, processing and modelling applications, e.g. in computational fluid dynamics, weather prediction, or long-term climate simulations. Video diffusion models (VDMs) are the current state-of-the-art method for generating highly realistic dynamics. However, training VDMs from scratch can be challenging and requires large computational resources, limiting their wider application. Here, we propose a time-consistency discriminator that enables pretrained image diffusion models to generate realistic spatiotemporal dynamics. The discriminator guides the sampling inference process and does not require extensions or finetuning of the image diffusion model. We compare our approach against a VDM trained from scratch on an idealized turbulence simulation and a real-world global precipitation dataset. Our approach performs equally well in terms of temporal consistency, shows improved uncertainty calibration and lower biases compared to the VDM, and achieves stable centennial-scale climate simulations at daily time steps.
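A common way for a discriminator to steer a frozen diffusion model is to add the gradient of the discriminator's log-odds, log(D / (1 - D)), to the model's score during sampling; for an optimal discriminator those log-odds equal the log density ratio between real and generated data. The Python sketch below shows that mechanism in one dimension with Langevin sampling. Everything in it is a toy assumption: the "pretrained model" is a standard normal, and the hypothetical discriminator simply prefers samples close to the previous frame `x_prev`. It is not the paper's trained time-consistency discriminator or its sampler.

```python
import numpy as np


def base_score(x):
    """Score of the toy 'pretrained' model, a standard normal: d/dx log p(x) = -x."""
    return -x


def discriminator_logit_grad(x, x_prev, a=1.0):
    """Gradient of the log-odds of a toy discriminator D(x) = sigmoid(b - a*(x - x_prev)**2).

    D rates x as more time-consistent the closer it is to the previous frame;
    d/dx log(D / (1 - D)) = d/dx [b - a*(x - x_prev)**2] = -2a(x - x_prev).
    """
    return -2.0 * a * (x - x_prev)


def langevin_sample(x_prev, guidance=1.0, n=5000, steps=500, eps=0.01, seed=0):
    """Unadjusted Langevin dynamics on the base score plus weighted discriminator guidance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for _ in range(steps):
        score = base_score(x) + guidance * discriminator_logit_grad(x, x_prev)
        x = x + eps * score + np.sqrt(2 * eps) * rng.standard_normal(n)
    return x


if __name__ == "__main__":
    print(langevin_sample(x_prev=1.5, guidance=0.0).mean())  # near 0
    print(langevin_sample(x_prev=1.5, guidance=1.0).mean())  # near 1.0
```

With `guidance=1` and `a=1`, the combined score -x - 2(x - x_prev) is the score of a Gaussian with mean 2·x_prev/3, so with `x_prev=1.5` the guided samples should center near 1.0, while the unguided ones stay near 0. The base model's score is never modified, which matches the abstract's point that no finetuning of the image model is needed.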

artificial intelligence, diffusion model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.09089

Country:

Europe > Germany (0.28)
Europe > Austria (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Government (0.46)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces

Yeo, Kyeongmin, Kim, Jaihoon, Sung, Minhyuk

arXiv.org Artificial Intelligence, Jan-26-2025

Figure 1: Assorted mesh textures and panoramas generated using StochSync, including one in the background (environment map), which is a 360° panorama. StochSync extends the capabilities of image diffusion models trained in square spaces to produce images in arbitrary spaces such as cylinders, spheres, tori, and mesh surfaces.

We propose a zero-shot method for generating images in arbitrary spaces (e.g., a sphere for 360 …

The zero-shot generation of various visual content using a pretrained image diffusion model has been explored mainly in two directions. First, Diffusion Synchronization, which performs reverse diffusion processes jointly across different projected spaces while synchronizing them in the target space, generates high-quality outputs when enough conditioning is provided but struggles in its absence. Second, Score Distillation Sampling, which gradually updates the target-space data through gradient descent, yields better coherence but often lacks detail. In this paper, we reveal for the first time the interconnection between these two methods while highlighting their differences. To this end, we propose StochSync, a novel approach that combines the strengths of both, enabling effective performance with weak conditioning. Project page: https://stochsync.github.io/.

Diffusion models pretrained on billions of images (Rombach et al., 2022; Midjourney) have demonstrated remarkable capabilities in various zero-shot applications.
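The synchronization step of Diffusion Synchronization can be illustrated with a one-dimensional stand-in for a cylindrical panorama: a circular canonical signal is read through overlapping windows ("projected spaces"), and per-view results are written back and averaged where the windows overlap, so every view agrees with one shared target-space signal. This Python sketch covers only that projection, unprojection, and averaging step under those toy assumptions; the helper names are hypothetical, and the per-step denoising as well as StochSync's combination with Score Distillation Sampling are not shown.

```python
import numpy as np


def project(canonical: np.ndarray, start: int, width: int) -> np.ndarray:
    """Read a window of `width` samples from a circular canonical signal."""
    idx = (start + np.arange(width)) % canonical.size
    return canonical[idx]


def synchronize(views, starts, size: int) -> np.ndarray:
    """Unproject each view to the canonical space and average overlapping samples.

    Samples covered by no view are left at 0.
    """
    total = np.zeros(size)
    count = np.zeros(size)
    for view, start in zip(views, starts):
        idx = (start + np.arange(view.size)) % size
        np.add.at(total, idx, view)  # accumulate contributions
        np.add.at(count, idx, 1)     # how many views cover each sample
    return total / np.maximum(count, 1)


if __name__ == "__main__":
    starts = [0, 4, 8]  # three 6-sample windows covering a 12-sample loop
    # Pretend each view's denoiser produced a different (inconsistent) estimate.
    views = [np.zeros(6), np.ones(6), np.full(6, 2.0)]
    canonical = synchronize(views, starts, 12)
    print(canonical)
    # Re-projecting gives views that now agree on their overlaps.
    print([project(canonical, s, 6) for s in starts])
```

In a synchronized sampler this averaging would be applied to intermediate predictions at every reverse step, so the views stay consistent as they are denoised; views that already agree pass through unchanged.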

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.15445

Country:

South America > Bolivia (0.04)
North America > United States > Kansas (0.04)
Europe > Italy > Tuscany (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.83)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
